ASGE International Sampler (On-Demand) | 2024
Principles of machine learning - Basics Part 1 - Computer vision, NLP
Video Transcription
Hello, this is Dr. Tyler Berzin at Beth Israel Deaconess and Harvard Medical School. I'm very happy to join you today to continue our discussion of machine learning in gastroenterology. For the next 15 minutes, the focus will be on the terms computer vision and natural language processing, which are core capabilities of AI and machine learning. Over the course of this series, you'll hear introductions to a variety of AI applications, ranging from computer vision, which is what we're using now for polyp detection, to drug discovery, robotics, natural language processing, and voice. We'll spend some time dissecting several of these terms so we can see their immediate relevance to our field. The reason there is so much recent focus on AI in and outside of medicine is that something unique has happened in the last 10 years or so: for very specific tasks, whether that's recognizing faces in photographs, playing chess, or driving a car, AI has started to match and exceed human performance. So it's a very exciting moment for us to be able to think creatively as physicians about how some of these capabilities could be applied to improving patient care. We'll start with an exploration of computer vision, and we'll use it as an example for understanding traditional programming, machine learning, and deep learning. A classic image challenge in computer vision is the dog-or-food challenge. As a human being, you can see some similarity between the chihuahua and the blueberry muffin, or the goldendoodle and the fried chicken, but the truth is you don't struggle to distinguish these images. That's because you have millions of years of visual evolution built into your brain, letting you recognize subtle things in your environment and distinguish between them.
But for a computer, these are actually really difficult challenges, and they're a good example of how this kind of problem can be tackled in a variety of ways. Let's think first about the traditional programming approach to the dog-versus-food challenge. Here, a programmer would have to code a whole set of if-then rules, 100 lines of code, 300, 1,000 lines of code, then run that code against the images to measure performance, which might reach 60 or 70% accuracy. You'd try to figure out where the errors were, then write another 1,000 lines of code to see if you could reach a higher level of performance. For very complex visual images, this approach has major limitations: you just can't get to human performance by writing down rules. This was tried for polyp detection, even predating AI applications. In 2003, several gastroenterologists got together with computer programmers and discussed what polyps look like: they're round, or they shine in this way, or they're more red than the surrounding mucosa. The programmers wrote down these rules, and with a great deal of effort, they reached some level of accuracy, but the system was usable only on still images, not video, and it was an image classifier: it could tell that there was a polyp somewhere in the image, but it couldn't tell you exactly where.
The machine learning approach really amps up our ability to do image recognition. The way machine learning works is that the human programmer works with a pre-made algorithm, something already designed to interpret the visual world, a computer vision algorithm, and that algorithm is then exposed to labeled data: images labeled dog or muffin. If you expose it to enough labels and enough images, 1,000, 5,000, you develop what's called a model, a trained algorithm that can now distinguish between dog and not-dog, and the human did not have to write 1,000 lines of code for the algorithm to learn. Now, the subtlety and the big leap between machine learning and deep learning is what I did not yet tell you about traditional machine learning: the human still provides some guidance about the critical features of the image. There's guidance saying you should focus on the eyes or the ears or the snout of the dog; traditional machine learning requires some help from the human to focus on certain features. Where deep learning really changes things is that the computer figures out the critical features of the image on its own, and that can be a good thing or a bad thing. It can recognize features in an image that a human may never be able to recognize, and in medicine, that may be very powerful for exploring patterns we cannot see or recognize visually. The downside is that many current deep learning systems cannot communicate which features were identified. So if you have a deep learning system that recognizes a profoundly interesting pattern in a set of images, a set of x-rays, a set of endoscopies, you can't necessarily query the system to see what could be learned from what it's done. This is the black box problem of deep learning: we may not always be able to learn what pattern it was recognizing.
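The contrast described above, between hand-written rules and a model learned from labeled examples, can be sketched in a few lines of Python. Everything here is hypothetical toy data: the two made-up features, the thresholds, and the simple nearest-centroid model are illustrative stand-ins for intuition only, not how a real computer vision system works.

```python
# Traditional programming: a human writes explicit if-then rules.
# Each "image" is reduced to two made-up features: (ear_pointiness, roundness).
def rule_based(features):
    ear_pointiness, roundness = features
    if ear_pointiness > 0.5:
        return "dog"
    return "muffin"

# Machine learning: no hand-written rules; instead, parameters are
# estimated from labeled examples (here, one mean feature vector per class).
def train_nearest_centroid(examples):
    sums, counts = {}, {}
    for features, label in examples:
        s = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lbl: [v / counts[lbl] for v in s] for lbl, s in sums.items()}

    def model(features):
        # Predict the class whose centroid is closest (squared distance).
        return min(centroids,
                   key=lambda lbl: sum((a - b) ** 2
                                       for a, b in zip(centroids[lbl], features)))
    return model

# Labeled data: the "dog or muffin" images, already reduced to features.
labeled_data = [((0.9, 0.2), "dog"), ((0.7, 0.4), "dog"),
                ((0.1, 0.9), "muffin"), ((0.2, 0.7), "muffin")]
model = train_nearest_centroid(labeled_data)
print(model((0.75, 0.35)))  # classify a new, unseen case
```

The point of the sketch is the shape of the workflow, not the algorithm: the programmer supplies labeled examples rather than rules, and the "model" is just learned parameters plus a prediction function. In deep learning, even the choice of features would be learned from raw pixels rather than specified by hand.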
Now, there are a few different ways that computer vision can recognize findings on your endoscopy monitor. I showed you that in the early days of polyp detection, classification was one approach: it basically makes a mark on the screen to say, hey, this image contains a cat, or this image contains a polyp. What we're more interested in is localization, finding where that object is and drawing a bounding box around it. The step beyond that is object detection, where multiple different objects are all outlined in the image, each assigned a class with its own bounding box. The last category is called instance segmentation or semantic segmentation, which classifies every single pixel, so the object of interest has a very specific outline drawn around every relevant pixel. What would this look like for GI endoscopy? A polyp classification system would just say, hey, somewhere in this image there's a polyp, with a yellow box in the corner of the screen. We're familiar with localization: this is where the polyp is located in the image. Object detection may help us recognize many things other than just polyps. And instance segmentation may not be needed for polyps, but it could potentially be helpful, for instance, for Barrett's dysplasia, where a very specific outline marks the area where dysplasia is seen. I think we're going to see all versions of this in the future computer vision programs we use in GI endoscopy. I won't go through all of the potential clinical applications of computer vision in GI endoscopy, but we're at the very beginning of a very long path. We're just at the moment of employing computer-aided detection for polyps. Very soon, we'll be seeing computer-aided diagnosis that tells us what the polyp is.
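The four output types just described can be made concrete by showing the data each kind of model would return for a single endoscopy frame. This is a hypothetical sketch: the labels, pixel coordinates, and scores are invented for illustration, and real systems use richer formats.

```python
# 1. Classification: one label for the whole image; no location information.
classification = {"polyp_present": True}

# 2. Localization: the label plus one bounding box (x, y, width, height in pixels).
localization = {"label": "polyp", "box": (120, 80, 60, 55)}

# 3. Object detection: many objects, each with its own class, box, and confidence.
detection = [
    {"label": "polyp",   "box": (120, 80, 60, 55),  "score": 0.92},
    {"label": "forceps", "box": (300, 200, 90, 40), "score": 0.88},
]

# 4. Segmentation: a class for every pixel (here a tiny 4x4 mask;
# 1 = lesion pixel, 0 = background), giving the exact outline of the finding.
segmentation_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

# The mask supports pixel-level measurements a bounding box cannot,
# such as the exact area of the lesion.
lesion_pixels = sum(sum(row) for row in segmentation_mask)
print(lesion_pixels)
```

Moving down the list, each output type carries strictly more spatial information, which is why segmentation is attractive for findings like Barrett's dysplasia, where the precise margin of the abnormal area matters.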
And over time, I think we're going to see semi-automated endoscopy reports that label and describe the findings with relatively little human intervention. Some of the most exciting areas, I think, will use deep learning to recognize patterns of endoscopic findings that humans can't recognize. An example would be looking at a patient who has ulcerative colitis and is taking a certain medication, and being able to recognize whether that patient is at risk for flaring. Not whether they are flaring, but whether they are at risk for flaring, by looking at microscopic patterns beyond what the human eye can recognize. This type of exploration, I think, is going to be incredibly powerful and will propel very novel insights for GI endoscopy. Let's now turn our attention to natural language processing and speech recognition. The broadest term, natural language processing, is basically a form of machine learning that allows us to process and analyze free text, dictated speech, and unstructured data. In medicine, there are really going to be two critical use cases. One is comprehending human speech and extracting meaning from it. The second is unlocking unstructured data from databases, documents, and EHRs, and mapping it to real concepts to create structure that allows us to do very powerful analytics. So let's break these terms apart very specifically. Speech recognition, which can involve AI underpinnings, simply refers to technology that processes speech effectively word for word. As you're dictating, you'd say, this is a 59-year-old male presenting with three hours of pain, period, next paragraph, and it would transcribe every one of those words. Natural language processing takes the text we might create from a word-for-word dictation, or from office notes, outside records, or even patient-generated information.
And it processes that text to organize the key concepts and can convert them into very structured data. It's this structured data that we so badly need for high-level analysis. An example of a very exciting technology that combines speech recognition and natural language processing is the concept of ambient clinical intelligence. At the very first level, a physician is meeting with her patient in the room, and the entire interaction is simply recorded word for word. That's just speech recognition. Where ambient clinical intelligence becomes exciting is that it can take that free-form dialogue, integrate it with background data from the EHR and the physician's practice style, and generate an organized medical note. And you can see what that looks like. We start off on the right side of the image with a transcript of the word-for-word discussion with the physician. On the left is a simplified, bullet-pointed history created by the computer's analysis of that transcript. As you scroll through the bullet-pointed history, you can pick out any word, and it will immediately take you to the transcript, the actual audio of that visit, so you can clarify any uncertainties. This is effectively like having an AI scribe that has some knowledge of medicine and can summarize the visit. The real hope and excitement here is to be able to just have a natural interaction with the patient, make the recommendations, and walk out of the room knowing that the medical note has effectively been finished, or nearly so, without having been hunched over the computer. Natural language processing holds incredible promise for analyzing our own practices and for measuring and reporting GI quality metrics. A good example of this was published in 2020, where physicians did a manual review of their colonoscopy reports looking at polyp detection rate, adenoma detection rate, failed cecal intubation, and a variety of other parameters.
Then they used a natural language processing program to do the same. This was an intentionally hard challenge, because this was a hospital system that used several different EHRs, and the natural language processing had to read some scanned-in documents using something called OCR, optical character recognition, effectively reading a scanned-in report. Taking all of that together, the natural language processing tools were able to get results extremely similar to those of expert physicians doing a manual chart review. I think the hope of this is best captured by one sentence: the authors described that their natural language processing algorithm took under 30 minutes to extract data on all the colonoscopies ever done at their institutions since the introduction of EHRs, whereas manual data collection took 160 person-hours just to annotate 600 patients. We're all familiar with the grinding, difficult work of trying to extract quality metrics from our own practices, and natural language processing, I think, is going to be an important unlock for us to measure and report our own quality measures much, much more easily. You can imagine walking into your office and saying, Alexa, show me my ADR trend for my screening colonoscopies during the last two years and upload it to the quality registry, or, hey Google, show me a graph of adenomas per colonoscopy at our endoscopy center during the recent GI Genius and Endocuff trials from January to April 2022. The math behind those questions is not hard; the context and extracting all the data is the hard part. Natural language processing will certainly aid us with these steps. A complexity and major barrier for us will remain that many of us use multiple EHRs that are not fully integrated.
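The idea of turning free-text reports into structured quality metrics can be sketched as follows. This is a greatly simplified, hypothetical illustration: the report text is invented, and real NLP systems use trained language models rather than the plain keyword matching shown here; the sketch only makes concrete the "free text in, structured metrics out" pattern described above.

```python
import re

# Hypothetical free-text colonoscopy reports (invented for illustration).
reports = [
    "Cecum reached. One 6 mm adenoma removed from the ascending colon.",
    "Cecum reached. No polyps identified.",
    "Procedure aborted; cecum not reached.",
]

def extract(report):
    # Map the free text to structured fields. A real NLP pipeline would
    # handle negation, synonyms, and OCR noise far more robustly.
    text = report.lower()
    return {
        "cecal_intubation": "cecum reached" in text,
        "adenoma_found": bool(re.search(r"\badenoma", text)),
    }

structured = [extract(r) for r in reports]

# Once the data is structured, the quality metrics are simple arithmetic.
adr = sum(r["adenoma_found"] for r in structured) / len(structured)
cecal_rate = sum(r["cecal_intubation"] for r in structured) / len(structured)
print(f"ADR: {adr:.0%}, cecal intubation rate: {cecal_rate:.0%}")
```

As the transcript notes, the arithmetic at the end is the easy part; the hard work is the extraction step, which in practice must cope with multiple EHRs, scanned documents, and variable dictation styles.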
And while natural language processing will help with some of this, the actual integration of different EHRs, sending data back and forth, is still a major hiccup and still requires human intervention and coordinated effort for us to become more integrated, so that our data is not residing in separate silos. I'll finish with a quote from the science fiction writer Arthur C. Clarke, who said that any sufficiently advanced technology is indistinguishable from magic. I think we can often feel that when we're using some of the AI-underpinned devices in our everyday lives. But we also have to be honest as physicians and ask ourselves: has the digital revolution actually been magical for us in the way it promised to be when digital interventions were introduced 20 years ago? I think most of us would agree that the result of this digital transformation of healthcare is that we're often hunched over computers, effectively acting as digital scribes translating the analog world into the EHR, and that has not led to a great deal of physician satisfaction. With AI, I think we have to be very alert to what has happened in the last 20 years, and it's critical that physicians are at the forefront of stating our needs and our patients' needs, and making sure that innovation in AI is driven by those needs. If we do that, I think AI in medicine represents at least an opportunity to really improve clinical insight by leveraging data more powerfully, to decrease the fast, shallow, repetitive work, the data entry, that we spend so much of our time doing, and hopefully to give us more space for innovation, creativity, and empathy, the things that got us into medicine in the first place and which ultimately are irreplaceable by a computer. So I will end there, and thank you for your attention.
Video Summary
Dr. Tyler Berzin from Beth Israel Deaconess and Harvard Medical School discusses machine learning in gastroenterology. He explains how computer vision and natural language processing are essential to AI and machine learning, detailing the challenges computers face in image recognition tasks like distinguishing between dogs and food. He compares traditional programming with machine learning and deep learning approaches, highlighting deep learning's ability to identify critical image features independently. Dr. Berzin also explores potential applications of computer vision in GI endoscopy, such as polyp detection and classification techniques. Additionally, he delves into the role of natural language processing in automating medical documentation and quality metric extraction, emphasizing its benefits in streamlining healthcare practices. Finally, he discusses the importance of physicians driving AI innovation to enhance patient care while maintaining the human elements of medicine.
Asset Subtitle
Tyler Berzin MD, MS, FASGE
Keywords
machine learning
gastroenterology
computer vision
natural language processing
AI in healthcare